Abstract

Background: Meiotic recombination ensures proper segregation of homologous chromosomes and creates 
genetic variation. In many organisms, recombination occurs at limited sites, termed 'hotspots', whose positions in 
mammals are determined by PR domain member 9 (PRDM9), a long-array zinc-finger and chromatin-modifier 
protein. Determining the rules governing the DNA binding of PRDM9 is a major issue in understanding how it 
functions. 

Results: Mouse PRDM9 protein variants bind to hotspot DNA sequences in a manner that is specific for both 
PRDM9 and DNA haplotypes, and this in vitro binding parallels their in vivo biological activity. Examining four 
hotspots, three activated by Prdm9 Cst and one activated by Prdm9 Dom2 , we found that all binding sites required the 
full array of 11 or 12 contiguous fingers, depending on the allele, and that there was little sequence similarity 
between the binding sites of the three Prdm9 Cst activated hotspots. The binding specificity of each position in the 
Hlx1 binding site, activated by Prdm9 Cst , was tested by mutating each nucleotide to its three alternatives. The 31 
positions along the binding site varied considerably in the ability of alternative bases to support binding, which 
also implicates a role for additional binding to the DNA phosphate backbone. 

Conclusions: These results, which provide the first detailed mapping of PRDM9 binding to DNA and, to our 
knowledge, the most detailed analysis yet of DNA binding by a long zinc-finger array, make clear that the binding 
specificities of PRDM9, and possibly other long-array zinc-finger proteins, are unusually complex. 
Background

Genetic recombination is an essential feature of meiosis, 
assuring an appropriate segregation of chromatids at the 
first meiotic division, and generating an evolutionarily 
important source of genetic variation by providing new 
arrangements of alleles between genes linked on the same 
chromosome. In many organisms, notably yeast [1], higher 
plants [2], and mammals including humans and mice 
[3-5], recombination is concentrated along chromosomes 
at limited sites known as 'hotspots'. Typically a kilobase in 
extent, hotspots are surrounded by long stretches of DNA, 
tens to hundreds of kilobases in extent, that are essentially 
devoid of recombination in humans and mice. 
Recently, several groups have shown that PR domain 
member 9 (PRDM9), a zinc-finger (ZF) protein with his- 
tone 3 lysine 4 (H3K4) methyltransferase activity, plays a 
key role in determining the locations of hotspots in both 
mice and humans [6-8]. It is presently proposed that 
PRDM9 binds to appropriate DNA sequences in meiotic 
chromatids, generates activated chromatin by virtue of its 
H3K4 methyltransferase activity, and somehow guides the 
generation of double-strand breaks (DSBs) at those sites 
by the topoisomerase-like protein SPO11 [9]. Analyses of 
individual human hotspots and a number of genome-wide 
studies have implicated PRDM9 as the predominant regu- 
lator of hotspot placement [6,7,10,11]. In mice, analyses of 
genome-wide hotspots of DSB formation [12] make it 
clear that PRDM9 determines the location of virtually all 
hotspots, with the clear exception of the obligate crossover 
at the pseudoautosomal region, at which recombination 
occurs equally well with different variants of PRDM9 
present, or indeed no variant at all. There is also evidence 
that Prdm9 participates in transcriptional regulation 
[13,14], which may be related to its involvement in hybrid 
sterility [14]. 

Although PRDM9 plays a very important role in mam- 
malian recombination, there is considerable uncertainty 
as to how it physically determines hotspot locations and 
then directs DSB formation there rather than at other tri- 
methylated H3K4 (H3K4-me3) sites, of which there are 
many. Meeting these challenges has repercussions both 
for expanding the current understanding of recombina- 
tion biology and for the insights this could provide in 
understanding the functions of other regulatory proteins 
with ZF arrays. More than 4% of human protein-coding 
genes contain ZF arrays, and half of those arrays are 
comparable in size with those present in PRDM9 [15-17]. 

The identification of PRDM9 as a regulator of human 
recombination [7] relied on the finding that the DNA 
sequence predicted to bind to the ZF array of the most 
common human variant (variant A) matched a 13 bp 
consensus sequence that characterizes 41% of human 
hotspots [18], and the fact that although this sequence 
is present in chimpanzee DNA, it does not characterize 
chimpanzee hotspots. Baudat et al. [6] correlated human 
hotspot activity with human allelic variation at PRDM9, 
and predicted a mouse PRDM9 DNA binding sequence 
found in a mouse hotspot. Parvanov et al. [8] identified 
Prdm9 in mice by genetically mapping the gene controlling 
hotspot activity to a 181 kb interval containing four 
genes, three of which could be excluded as candidates. 

Although PRDM9 reportedly binds to hotspot DNA 
[6,19], there is still considerable confusion about the nat- 
ure of the DNA sequences recognized by the ZF array of 
PRDM9 and how this protein achieves its locational spe- 
cificity. Many more copies of the human consensus 
sequence are found in the genome at non-hotspot sites 
than at the hotspots themselves [18], and Berg et al. [10] 
showed that human hotspots possessing or not posses- 
sing this consensus sequence are equally dependent on 
PRDM9. In mice, the consensus sequence originally 
predicted for the mouse PRDM9 Cst variant [6] is more 
commonly present in non-hotspot than in hotspot 
regions, and the ZF prediction programs used in various 
studies [20-22] also predict that the mouse Dom2 variant 
of PRDM9 (PRDM9 Dom2 ) should bind to hotspots that 
genetic studies have shown are not activated by this 
allele. Nevertheless, when tested experimentally, longer 
oligonucleotides containing buried sequences matching 
the predicted recognition motifs of two human PRDM9 
alleles have been shown to bind PRDM9 protein 
expressed in cell cultures [6,19]. 

To address the DNA binding problem experimentally, 
we expressed the mouse Prdm9 Cst allele (from the 
CAST/EiJ strain; referred to below as CAST) and the 
Prdm9 Dom2 allele (from the C57BL/6J strain; referred to 
below as B6) in Escherichia coli, and determined the 
abilities of their respective protein products to bind DNA 
sequences from three mouse hotspots (Hlx1, Esrrg-1, and 
Psmb9) known from genetic evidence to be activated by 
Prdm9 Cst and one (Pbx1) known to be activated by 
Prdm9 Dom2 . Binding between expressed PRDM9 and 
DNA was tested both by the variant-specific ability of 
PRDM9 to modify the electrophoretic mobility of target 
DNA sequences and, conversely, by the ability of DNA 
binding-site sequences to physically sequester PRDM9. 

Results 

Histone H3K4 trimethylating activity 

As an indication that the PRDM9 expressed in E. coli is 
native protein, we first confirmed that it retains its his- 
tone trimethylating activity. Both the induced 
PRDM9 Dom2 and the PRDM9 Cst E. coli extracts showed 
enhanced H3K4 trimethylating activity compared with 
uninduced extracts, and induced empty vector (see 
Additional file 1, Figure S1). 

Fine mapping of PRDM9 Dom2 and PRDM9 Cst binding sites 

The Hlx1 and Esrrg-1 hotspots, described in our previous 
work [8,23], require activation by the Prdm9 Cst 
allele, as does Psmb9 [6]. We concluded that the Pbx1 
hotspot is dependent on the Prdm9 Dom2 allele because it 
was active only in crosses involving B6 and not in 
crosses lacking this genetic background (see Additional 
file 2, Figure S2A). 

Pbx1 had a single, PRDM9 Dom2 allele-specific binding 
site located at the left end of the interval tested (Figures 
1A, B; see Additional file 2, Figure S2B). The shortest 
Pbx1 oligo that showed binding to this allele, which has 
12 contiguous ZFs, was 34 bp long (Figure 1C; see Additional 
file 2, Figure S2C). This is close to the 36 bp that is 
predicted if all of the ZFs bind their expected 3 bp 
sections. The binding site contained seven strong 
matches to the binding site predicted for PRDM9 Dom2 by 
the linear Persikov-Singh algorithm [22] when aligned to 
fingers 1 to 11 (Figure 1D), counting the 15 nucleotides 
that are found at > 60% frequency at each position. 
However, this was not statistically significant compared 
with the 3.75 matches expected by chance (P = 0.057 by 
a binomial distribution). The Pbx1 binding site is positioned 
6 bp distal to the single-nucleotide polymorphism 
(SNP) between the B6 and CAST strains within the 
hotspot (see Additional file 2, Figure S2D). Surprisingly, 
although this hotspot has a single binding site, the locations 
of the genetic crossovers within this hotspot depend 
on the nature of the genetic cross used in its detection. In 
the case of the B6xCAST and B6xPWD/PhJ interstrain 
crosses, where the recombining mice are heterozygous 
[Figure 1 image: gel-shift panels B and C; sequence alignment in panel D]

Panel B, fraction shifted (lanes 1 to 11): 0.02 0.02 0.03 0.02 0.03 0.0 0.02 0.34 0.03 0.28 0.0

Panel C
Pbx1* (75 bp)         +     +     +     +     +     +
Induced PRDM9 Dom2    +     +     +     +     +     +
Competitor (bp)       -     75    29    31    33    34
Fraction shifted      0.13  0.01  0.06  0.06  0.06  0.03

Panel D
Pbx1 (34 bp)    GAGAACTTACAAAGGTAGGACGTAAATGGAGTGT

Figure 1 Mapping the PRDM9 Dom2 binding site of Pbx1. (A) Scheme of Pbx1 hotspot with polymorphisms and amplicons for initial testing 
of PRDM9 Dom2 binding. The red line shows the position of the single binding site detected at this hotspot. (B) Pbx1 specifically binds to 
PRDM9 Dom2 but not PRDM9 Cst . The compositions of the binding reactions are shown above each lane. Red arrow, shifted band; black arrow, 
unbound fragment. (C) Detailed mapping of the Pbx1 binding site. The compositions of the binding reactions are shown above each lane. For 
the sequences of the competitor oligos used for final mapping, see Additional file 11 (Additional experimental procedures). Red arrow, shifted 
band; black arrow, unbound fragment. (D) The sequence of the 34 bp oligo representing the minimal binding site is aligned to the inferred 
PRDM9 Dom2 binding motif [6,21] and the zinc-finger array of PRDM9 Dom2 (amino acids at positions -1, 3, and 6 relative to the α-helix of each 
finger are shown). The strong matches of the binding site to the motif are in red. 



across the entire genome, genetic crossovers were distributed 
on both sides of the binding site. However, in the 
case of the B6xB6.CAST-1T congenic cross, where the 
F1 mice are heterozygous only for the distal 100 Mb of 
chromosome 1 and are homozygous across the rest of 
the genome, genetic crossovers were located to one side 
of the binding site (see Additional file 2, Figure S2A). 
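
The chance expectation and P value quoted above for the Pbx1 site can be checked with a short calculation. The following is a minimal sketch, not the authors' analysis script. It assumes that each of the 15 scored positions matches the predicted base independently with probability 0.25 (equal base frequencies) and that the reported value is a one-sided upper tail; the function name is illustrative. Under these assumptions, 15 x 0.25 gives the 3.75 matches expected by chance, and the probability of seven or more matches is close to the reported 0.057.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of observing k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumptions (not stated explicitly in the text): 15 scored positions,
# chance match probability 0.25 per position, one-sided test.
n_positions = 15
p_match = 0.25
observed_matches = 7

expected_matches = n_positions * p_match
p_value = binomial_upper_tail(observed_matches, n_positions, p_match)

print(f"expected by chance: {expected_matches:.2f}")
print(f"P(X >= {observed_matches}) = {p_value:.3f}")
```

The same function can be applied to the other hotspots, provided the number of scored positions used for each alignment is known.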

Hlx1 also possesses a single, variant-specific binding 
site, this time located at the middle of the genetic interval 
defining the hotspot (Figure 2A, B; see Additional file 3, 
Figure S3A). The shortest Hlx1 oligo showing binding to 
PRDM9 Cst was 31 bp long (Figure 2C, left panel), close to 
the 33 bp predicted if all 11 contiguous ZFs in this allele 
bind their expected 3 bp. Both B6 and CAST sequences 
showed nine matches to the binding site predicted for 
PRDM9 Cst by the linear Persikov-Singh algorithm, but at 
different positions, and this was significant (P = 0.00034) 
compared with the 3.5 matches expected by chance. 

The B6 and CAST sequences at the Hlx1 binding site 
differ at three positions (Figure 2D; see Additional file 3, 
Figure S3B), two of which affect binding 
affinity and crossover rate in parallel at this hotspot. 
Although both the B6 and CAST sequences bound 
PRDM9 Cst , binding by the B6 sequence was about 2.9 
times stronger than that by the CAST sequence (Figure 
2C, right panel). This corresponds with our previous 
demonstration that initiation of meiotic recombination 
at Hlx1 is about 2.5 times more frequent on the B6 
chromosome than on the CAST chromosome [5]. A 
recent estimate for Hlx1 by another method found about 
a four-fold difference between the B6 and CAST 
sequences using 41 bp oligonucleotides [19]. 
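
As a quick check on the haplotype comparison above, the sketch below locates the differences between the two 31 bp Hlx1 minimal binding-site sequences given in Figure 2D and recomputes the binding ratio. It is an illustration only: the two fraction-shifted values (0.47 and 0.16) are assumed to be the B6 and CAST lanes of Figure 2C, right panel, and the variable names are ours.

```python
# 31 bp minimal Hlx1 binding sites as printed in Figure 2D.
HLX1_B6 = "AGTGTGCAGACTTGGACCCTGCCCTTTCTTT"
HLX1_CAST = "AGTGTTCAGACTTGGACTCTGCCCTTCCTTT"

# 1-based positions at which the two haplotypes differ.
differences = [
    (pos, b6, cast)
    for pos, (b6, cast) in enumerate(zip(HLX1_B6, HLX1_CAST), start=1)
    if b6 != cast
]

# Assumed lane assignment: fraction of labeled oligo shifted by PRDM9 Cst.
fraction_shifted_b6 = 0.47
fraction_shifted_cast = 0.16
binding_ratio = fraction_shifted_b6 / fraction_shifted_cast

for pos, b6, cast in differences:
    print(f"position {pos}: B6 {b6} / CAST {cast}")
print(f"B6:CAST binding ratio = {binding_ratio:.1f}")
```

The ratio of fraction-shifted values is only a rough proxy for relative affinity, because band-shift fractions are not guaranteed to scale linearly with binding strength.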
[Figure 2 image: gel-shift panels B, C, and E; sequence alignment in panel D]

Panel B, fraction shifted (lanes 1 to 11): 0.0 0.0 0.34 0.07 0.41 0.0 0.01 0.01 0.01 0.01 0.0

Panel C, left
Hlx1* (75 bp)        +     +     +     +     +     +     +
Induced PRDM9 Cst    +     +     +     +     +     +     +
Competitor (bp)      -     30    31    32    33    36    75
Fraction shifted     0.66  0.62  0.11  0.12  0.08  0.08  0.05

Panel C, right
B6 Hlx1* (75 bp)     +     -
CAST Hlx1* (76 bp)   -     +
Induced PRDM9 Cst    +     +
Fraction shifted     0.47  0.16

Panel D
Hlx1 (31 bp) C57BL/6J    AGTGTGCAGACTTGGACCCTGCCCTTTCTTT
Hlx1 (31 bp) CAST/EiJ    AGTGTTCAGACTTGGACTCTGCCCTTCCTTT

Panel E (Hlx1*, 75 bp; rows: Induced PRDM9 Cst, Induced PRDM9 Dom2, Induced vector, Cold competitor Hlx/Esr/Psmb/Pbx)
Fraction shifted     0.0 0.35 0.03 0.02 0.06 0.03 0.28 0.01

Figure 2 Defining the PRDM9 Cst binding site of Hlx1. (A) Scheme of Hlx1 hotspot with polymorphisms between the C57BL/6J (B6) and the 
CAST/EiJ (CAST) mouse strains (from Paigen et al. [5]) and amplicons for initial testing of PRDM9 Cst binding. The positions of single-nucleotide 
polymorphisms (SNPs) were taken from the National Center for Biotechnology Information (NCBI) build 37. The red line represents the only 
fragment showing binding at Hlx1. (B) Hlx1 specifically binds to PRDM9 Cst but not PRDM9 Dom2 . The compositions of the binding reactions are 
shown above each lane. Red arrow, shifted band; black arrow, unbound fragment. (C) Detailed mapping of the Hlx1 binding site. The 
compositions of the binding reactions are shown above each lane. For the sequences of the competitor oligos used for final mapping, see 
Additional file 11 (Additional experimental procedures). (Left panel) Definition of the minimal binding site; (right panel) Strength of binding of 
PRDM9 Cst to B6 and CAST sequences of the Hlx1 binding site. Red arrow, shifted band; black arrow, unbound fragment. (D) Sequences of the 
Hlx1 minimal binding sites in B6 and CAST strains aligned to the inferred PRDM9 Cst binding motif. Sequence differences are underlined. The 
strong matches of the binding sites to the motif are in red. (E) Hlx1 competes with the other PRDM9 Cst -dependent binding sites. The 
compositions of the binding reactions are shown above each lane. Red arrow, shifted band; black arrow, unbound fragment. 



Esrrg-1 had a single, variant-specific binding site, also 
located near the middle of the genetic interval identifying 
this hotspot (Figure 3A; see Additional file 4, Figure 
S4A). This binding site lacked any sequence differences 
between B6 and CAST (see Additional file 4, Figure S4B). 
The minimum length of the binding site for the Esrrg-1 
hotspot was 33 bp (Figure 3C), and there were only four 
strong matches to the binding site predicted for 
PRDM9 Cst (Figure 3G), which was not different from the 
chance expectation (P = 0.31). 

The Psmb9 hotspot has been identified and genetically 
characterized previously [24]. We tested only its 
central region, where the PRDM9 binding reportedly 
occurs [19] (Figure 3D), confirming this prediction. 
Testing the reduced oligos, we found a 30 bp minimal 
binding site showing allele specificity (Figure 3F). Its 
best alignment to the predicted site had eight strong 
matches (Figure 3G), which is likely to be significant 
(P = 0.00098). 

Comparison of PRDM9 Cst binding sites 

The lengths of the three binding sites (30 to 33 bp) suggest 
that binding involves all of the Zn fingers in the 
array, with the possible exception of either the first or 
[Figure 3 image: Esrrg-1 panels A to C, Psmb9 panels D to F, sequence alignment in panel G]

Panels B (Esrrg-1*, 80 bp) and E (Psmb9*, 80 bp); rows in each: Induced PRDM9 Cst, Induced PRDM9 Dom2, Induced vector, Cold competitor Hlx/Esr/Psmb/Pbx
Fraction shifted, first gel      0.0 0.53 0.02 0.02 0.03 0.03 0.31 0.03
Fraction shifted, second gel     0.0 0.38 0.02 0.02 0.02 0.08 0.34 0.03

Panel C
Esrrg-1* (80 bp)     +     +     +     +     +
Induced PRDM9 Cst    +     +     +     +     +
Competitor (bp)      -     80    30    33    36
Fraction shifted     0.40  0.08  0.25  0.12  0.12

Panel F (Psmb9*, 80 bp; Induced PRDM9 Cst)
Competitor (bp)      80 30r 28r 25r 32l 30l
Fraction shifted     0.50 0.09 0.27 0.28 0.53 0.16 0.19

Panel G
Esrrg-1 (33 bp)    ATACTTTGCAAATATCAAGGCTCTAATACAAAT
Psmb9 (30 bp)      ATCCAGGGAATAGAACTTTGACCATTACCC



Figure 3 Defining the PRDM9 binding sites of Esrrg-1 and Psmb9. (A) Scheme of Esrrg-1 hotspot with polymorphisms between C57BL/6J 
(B6) and CAST/EiJ (CAST) and amplicons for initial testing of PRDM9 Cst binding. The red line represents the only fragment showing binding at 
this hotspot. (B) Esrrg-1 specifically binds to PRDM9 Cst but not PRDM9 Dom2 and competes with the other PRDM9 Cst -dependent binding sites. The 
compositions of the binding reactions are shown above each lane. Red arrow, shifted band; black arrow, unbound fragment. (C) Detailed 
mapping of the Esrrg-1 binding site. The compositions of the binding reactions are shown above each lane. For the sequences of the 
competitor oligos used for final mapping, see Additional file 11 (Additional experimental procedures). Red arrow, shifted band; black arrow, 
unbound fragment. (D) Scheme of Psmb9 hotspot with polymorphisms between B6 and CAST, and amplicons for initial testing of PRDM9 Cst 
binding. The approximate position of the Psmb9 binding site was reported previously [19], and therefore only three amplicons surrounding it 
were tested. Only the middle fragment showed binding to PRDM9 Cst . (E) Psmb9 specifically binds to PRDM9 Cst but not PRDM9 Dom2 and 
competes with the other PRDM9 Cst -dependent binding sites. The compositions of the binding reactions are shown above each lane. Red arrow, 
shifted band; black arrow, unbound fragment. (F) Detailed mapping of the Psmb9 binding site. The compositions of the binding reactions are 
shown above each lane. For the sequences of the competitor oligos used for final mapping, see Additional file 11 (Additional experimental 
procedures). Red arrow, shifted band; black arrow, unbound fragment. (G) Sequences of the Esrrg-1 and Psmb9 minimal binding sites aligned 
with the inferred PRDM9 Cst binding motif. The strong matches of the binding sites to the motif are in red. 



the last finger, noting that the first finger has an ECH2 
configuration rather than C2H2. 

There is very little similarity between the sequences of 
the three PRDM9 Cst binding sites. Any pairwise matches 
in sequence for the three sites were distributed over the 
entire length of the binding sites. The best alignment of 
the three minimal binding sites for the PRDM9 Cst ZF 
array identified only four conserved positions (P = 0.052) 
(see Additional file 5, Figure S5, red); two would have 
been expected by chance. These triple matches are 
located at positions predicted to bind the second, fourth, 
sixth, and eighth fingers of PRDM9 Cst . Pairwise comparisons 
identified 15 matches between Hlx1 and Esrrg-1 
(see Additional file 5, Figure S5, red and green), 12 
matches between Hlx1 and Psmb9 (see Additional file 5, 
Figure S5, red and yellow), and only 6 matches between 
Psmb9 and Esrrg-1 (see Additional file 5, Figure S5, red 
and cyan). We examined this level of similarity using two 
statistical methods. Using the χ2 test, the probability of 
obtaining by chance the observed number of nucleotide 
matches between the three sites (scored as 0, 2, or 3 
matches at each position) was 0.048 (barely significant), and this 
significance entirely disappeared when we considered only 
the 17 positions where the mutation analysis (see below) 
indicated the strongest evidence for nucleotide specificity. 
Additionally, using the binomial distribution for pairwise 
comparisons, we found some support for similarity 
between Hlx1 and Esrrg-1 (P = 0.003) and between Hlx1 
and Psmb9 (P = 0.048), but not between Psmb9 and 
Esrrg-1 (P = 0.75). Given this marginal sequence similarity 
between the three PRDM9 Cst binding sites, we confirmed 
by testing their ability to compete with each other for 
binding that they do in fact bind to the same molecular 
entity. The sites do compete with each other, and the data 
suggest that Hlx1 and Psmb9 both bind more strongly 
than Esrrg-1 (Figure 2E; Figure 3B, E). There was also no 
consistency in the binding specificity of the same ZF 
located at different positions in the array. The amino acids 
at positions -1, 2, 3, and 6 in each finger are thought to 
make contact with the DNA helix and determine binding 
specificity [21,22,25]. These amino acids are identical 
(ASNQ), as are all the rest of the amino acids in fingers 2, 
5, 7, and 9 of the PRDM9 Cst Zn finger array, and all 11 
fingers in this array contain serine at the -2 position, 
which is also thought to contribute to specificity. Nevertheless, 
for the three PRDM9 Cst binding sites, there is no 
consistency in the four triplets that these four fingers bind 
(see Additional file 5, Figure S5). A similar result was seen 
for the nucleotides binding the pairs of identical fingers in 
the PRDM9 Dom2 variant at the Pbx1 hotspot. 

In vivo H3K4 trimethylation sites 

The locations of the DNA binding sites for all hotspots 
were within the regions representing the peak locations 
of meiotic histone H3K4-me3 marks that result from 
PRDM9 binding; these were measured in mouse testes 
by chromatin immunoprecipitation (ChIP) using an 
antibody against H3K4-me3 (Figure 4). The same pat- 
tern has been previously reported for Psmb9 [19,26]. It 
should be noted that the histone modification occurs 
over appreciable distances, a kilobase or more from the 
actual binding site itself, indicating that H3K4-trimethy- 
lation at hotspots involves nucleosomes beyond those 
immediately adjacent to the binding site. 

Computer predictions 

The algorithms developed by Persikov and Singh [22] 
are commonly used to predict ZF DNA binding 
sequences. These include both a linear prediction pro- 
gram based on an elemental identity of trinucleotide 
binding sequences for each finger, and a polynomial 
form of the program that takes nucleotide interactions 
into account. In every case, when both programs were 
tested against 200 bp sequences surrounding the four 
identified binding sites, the programs identified multiple 
binding sites within the 200 bp sequence. In only two of 
the eight tests was the actual binding site the top-scoring 
site; in two cases, the program failed to identify the 
binding site at all, and the quality of prediction declined 
noticeably when tested against longer DNA sequences. 
The performance of the linear and polynomial prediction 
programs differed for each hotspot. Only the polynomial 
program correctly identified the Hlx1 binding 
site, whereas only the linear program identified Pbx1. 
Both programs identified Esrrg-1 and Psmb9 (see Additional 
file 6, Table S1). 

As an alternative, we used a position-weighted matrix 
[27] derived from the detailed binding requirements of 
the Hlx1 hotspot (see Additional file 7, Table S2) to 
scan for DNA binding sites that coincide with geneti- 
cally determined hotspots on mouse chromosomes 1 
and 11 [5,28]. Unfortunately, we failed to find a DNA 
binding motif common to genetically identified hotspots 
on these chromosomes. 
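
To make the scanning step concrete, the sketch below shows how a position weight matrix can be built from per-position relative binding values and slid along a sequence. It is a generic illustration, not the procedure of reference [27] or the matrix in Additional file 7: the three-position matrix, the pseudocount, the uniform background of 0.25, and the function names are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(relative_binding, pseudocount=0.01, background=0.25):
    """Convert per-position relative binding values into log2-odds weights."""
    pwm = []
    for column in relative_binding:
        total = sum(column[base] + pseudocount for base in BASES)
        pwm.append(
            {
                base: math.log2(((column[base] + pseudocount) / total) / background)
                for base in BASES
            }
        )
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start index, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[offset][sequence[start + offset]] for offset in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical 3-position matrix: position 1 prefers A, position 2 prefers C,
# position 3 tolerates any base (no sequence specificity).
toy_binding = [
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 1.0, "G": 0.1, "T": 0.1},
    {"A": 0.5, "C": 0.5, "G": 0.5, "T": 0.5},
]
toy_pwm = build_pwm(toy_binding)
scores = scan("TTACGTT", toy_pwm)
best_start, best_score = max(scores, key=lambda item: item[1])
print(f"best window starts at index {best_start} with score {best_score:.2f}")
```

A position that accepts all four bases contributes a weight of zero, so a matrix of this kind cannot capture fingers that are required for binding but show no base preference.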

Mutational characterization of the Hlx1 binding site 

Given the diversity of the PRDM9 Cst binding sites, we 
chose the Hlx1 DNA sequence to further characterize 
the binding specificities of PRDM9 and to determine the 
nucleotide specificity of each position in the binding site. 
Because it was not possible to test all 10^19 combinations 
of base pairs in a 31 bp oligo, we used a competition 
assay to determine how binding was affected when each 
base pair was replaced in turn by its three possible alter- 
natives. For this, we first determined the kinetics of repla- 
cement of an already bound double-stranded oligo by a 
competitor (see Additional file 8, Figure S6A). The repla- 
cement rate was very slow, indicating very tight binding; 
reduction of the signal by unlabeled competitor was 
apparent only after 4 hours of incubation. This made 
possible a competition assay in which unlabeled mutated 
oligos were pre-incubated with expressed PRDM9 Cst for 
1 hour, then one-twentieth the number of molecules of 
labeled oligo of the original sequence were added, and 
the mixture was incubated for an additional 4 hours. At 
the end of the incubation, the extent of binding of the 
labeled oligo was tested by using the electrophoretic 
mobility assay (see Additional file 8, Figure S6B). Because 
the presence of a biotin tag at the end of the minimum 
binding sequence diminishes binding affinity, the labeled 
oligo was 75 bp long, with the binding site located in the 
middle, while the competing oligos were 36 bp long, 
including two additional base pairs at the 5' end and 
three additional base pairs at the 3' end relative to the 
minimal 31 bp binding site. In this assay, the ability of a 
mutated oligo to reduce subsequent binding by labeled 
oligo provided a measure of its ability to bind PRDM9; 
appearance of a strong signal band indicated that the 
unlabeled mutated oligo failed to bind PRDM9, a weaker 
signal indicated lower binding strength, and lack of any 
[Figure 4 image: H3K4-me3 ChIP profiles by hotspot (panel title: Esrrg-1); x-axis: Distance From Binding Site (kb)]

Figure 4 H3K4-me3 marks are enriched near the binding sites of the hotspots tested. The peak of H3K4-me3 at hotspots is centered near 
the PRDM9 binding site. Chromatin was prepared using spermatocytes from mice 12 days post-partum and subjected to chromatin 
immunoprecipitation (ChIP) with antibody directed to H3K4-me3 or normal rabbit IgG. Quantitative PCR was performed for 8 to 9 amplicons 
distributed across about 10 kb surrounding the hotspot on immunoprecipitated chromatin and an equal amount of MNase-treated, undiluted 
input DNA to calculate the fraction of chromatin bound at each amplicon. Blue line, B6 x CAST F1; green line, B6; red line, rabbit IgG (negative 
control). An identical distribution of H3K4-me3 marks for Psmb9 was shown previously by Grey et al. [19]. 



signal indicated very strong binding. The results of these 
tests are presented (Figure 5A; see Additional file 8, 
Figure S6C), and summarized in graphical form (Figure 
5B), where we assumed the 5'-3' orientation of the DNA 
to be relative to the N-C orientation of the ZF array, 
based on its much better fit to the computer-predicted 
binding sequence, rather than the reverse. 
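
The scale of the design choice described above can be illustrated with a short enumeration. The sketch below generates every single-nucleotide substitution of the 31 bp Hlx1 (B6) minimal site shown in Figure 2D and compares that count with the number of all possible 31 bp sequences. It is illustrative only: the competitor oligos actually used were 36 bp long with flanking bases, which the sketch omits, and the function name is ours.

```python
HLX1_B6 = "AGTGTGCAGACTTGGACCCTGCCCTTTCTTT"  # 31 bp minimal site, Figure 2D


def single_substitutions(site):
    """Return (position, original base, new base, mutated sequence) tuples."""
    variants = []
    for index, original in enumerate(site):
        for alternative in "ACGT":
            if alternative != original:
                mutated = site[:index] + alternative + site[index + 1:]
                variants.append((index + 1, original, alternative, mutated))
    return variants


variants = single_substitutions(HLX1_B6)
all_combinations = 4 ** len(HLX1_B6)

print(f"single-substitution oligos: {len(variants)}")
print(f"all possible 31 bp sequences: {all_combinations:.2e}")
```

Three alternatives at each of 31 positions gives 93 oligos, against roughly 4.6 x 10^18 possible sequences, which is the order of magnitude quoted in the text.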
The 31 positions of the Hlx1 site varied greatly in their 
specificity. As a simple measure of the degree of specificity, 
we calculated the standard deviation (SD) of the 
measured binding affinities at each position (see 
Additional file 9, Table S3). Using this indicator of specificity, 
the positions fell into three groups: 13 of low specificity 
(SD = 0.03 to 0.12), 10 of moderate specificity (SD 
= 0.16 to 0.36), and 8 with the highest specificity (SD = 
0.45 to 0.52), located at fingers 4, 5, and 6 near the center 
of the binding site. The weakest specificity was at the C- 
terminal end, fingers 9 to 11, corresponding to positions 
25 to 33 of the binding site. Given that these fingers are 
nevertheless required for binding, it is possible that they 
function in a non-sequence-specific manner, for example 
by stabilizing the contact between PRDM9 and DNA 
[Figure 5 image: competition-assay gel (A), substitution summary (B), and PRDM9 Cst ZnF array (C)]

Figure 5 Nucleotide substitution analysis of Hlx1-B6 binding to PRDM9 Cst . (A) Competition assay for testing the binding of PRDM9 Cst to 
mutated Hlx1 binding sites. Substitutions of nucleotides 4 to 6; labeled oligos are indicated by asterisk. The compositions of reaction mixtures in 
the first three lanes are shown above each lane. Lanes 4 to 12 show the competition test with mutated oligos. The position and the nature of 
each mutation in the oligo are shown. The composition of the reaction mixtures in lanes 4 to 12 is mutated oligo + PRDM9 Cst + oligo* (see 
Materials and methods). The fraction of the shifted band is indicated below each lane. Red arrow, shifted band; black arrow, unbound fragment. 
(B) Graphic representation of the binding changes at each position produced by the nucleotide substitution analysis. (C) ZF domain of 
PRDM9 Cst . The letters in the boxes show the amino acids at positions -1, 3, and 6 relative to the α-helix of each finger. 



through charge interactions involving the phosphate 
backbone. In this regard, it is interesting that these 
fingers seem to be less subject to evolutionary selection; 
the last two fingers are invariant between mouse alleles, 
and the next has only a single amino acid substitution, N 
for V. Replacement of nucleotides 1 to 2 and 34 to 36, 
adjacent to the shortest 31 bp sequence (see Additional 
file 8, Figure S6C), also affected binding affinity, lending 
further support to the possibility of more complex inter- 
actions between DNA and ZFs. 
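
The SD-based specificity measure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' calculation: the binding values are invented, the sample standard deviation over the four bases at a position is assumed (the text does not say whether a sample or population SD was used), and the two cutoffs are hypothetical values placed in the gaps between the three SD ranges reported in the text.

```python
from statistics import stdev

# Hypothetical cutoffs chosen between the reported ranges
# (low 0.03 to 0.12, moderate 0.16 to 0.36, high 0.45 to 0.52).
LOW_CUTOFF = 0.14
HIGH_CUTOFF = 0.40


def specificity_group(binding_values):
    """Classify one position from the relative binding of its four bases."""
    sd = stdev(binding_values)  # sample SD; an assumption
    if sd < LOW_CUTOFF:
        return "low", sd
    if sd < HIGH_CUTOFF:
        return "moderate", sd
    return "high", sd


# Invented relative binding values (original base first, then the three
# substitutions); 1.0 means binding equal to the unmutated site.
example_positions = {
    "tolerant position": [1.0, 0.95, 0.90, 1.0],
    "intermediate position": [1.0, 0.8, 0.5, 0.6],
    "strict position": [1.0, 0.0, 0.0, 0.0],
}

groups = {}
for name, values in example_positions.items():
    group, sd = specificity_group(values)
    groups[name] = group
    print(f"{name}: SD = {sd:.2f} -> {group} specificity")
```

With values bounded between 0 and 1, a position where only the original base supports binding gives a sample SD of 0.5, which falls in the highest range reported.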

Magnesium inhibition 

Our data suggesting that binding might involve mechanisms 
other than interaction with nucleotides in the major 
groove of DNA, as presently assumed [33], prompted us 
to test the effect of Mg2+ ions, which are well known to 
interact with polyphosphates. We found that binding is 
strongly inhibited at concentrations above 1 mmol/l and 
nearly completely inhibited by 10 mmol/l (see Additional 
file 10, Figure S7). This inhibition was essentially identical 
for both the PRDM9 Dom2 and PRDM9 Cst variants at 
all four hotspots tested. The possible involvement of the 
phosphate backbone of DNA in binding to PRDM9 
seems to be a particular characteristic of PRDM9 rather 
than a general characteristic of ZF proteins, as it is notably 
different from the effects of Mg2+ on DNA binding by 
the ZF proteins WT1 and EGR1 [29]; for those proteins, 
millimolar concentrations of Mg2+ activated binding, and 
higher concentrations had little inhibitory effect [29]. 

Discussion 

Taken together, our results confirm that the binding seen 
with PRDM9 expressed in E. coli correctly recapitulates 
the biological specificities of PRDM9, including its allelic 
specificity, the haplotype specificity of its target, and the 
location of the binding sites near the peaks of hotspot 
H3K4 trimethylation detected in vivo. Moreover, the correlation 
between the binding affinities of different Hlx1 
haplotypes and their genetically measured rates of crossover 
also suggests that the strength of the physical binding 
of PRDM9 to hotspot sequences may determine the efficiency 
of recombination initiation. There was little 
sequence similarity between the several binding sites for 
the same PRDM9 variant, and the binding specificities of 
individual fingers appeared to be context-dependent. 
Mutational analysis of the Hlx1 binding site indicates 
considerable variation between the ZFs in their contribu- 
tion to overall binding affinity and the probable involve- 
ment of the DNA phosphate backbone. 

These findings have relevance to the general role of 
ZF arrays as a DNA-recognition motif in biology, given 
that ZF proteins are the most common DNA regulatory 
protein in vertebrate genomes, comprising over 4% of 
all protein-coding genes in humans, and that half of 
these contain 10 or more fingers. They are almost as 
common in invertebrates, in which they were first 
described [17]. The issues surrounding the DNA binding 
specificities of ZFs are both biological and chemical. 
Biologically, we want to derive a consensus binding 
sequence whose parameters predict the location and 
relative binding affinity of genomic binding sites, and 
then describe how these affinities relate to biological 
outcomes. Chemically, we want to understand how the 
specificity of binding sequences is determined by the 
atomic and molecular interactions between protein and 
DNA. 

PRDM9 provides a particularly useful system for 
addressing these issues. It is highly variable, with multi- 
ple variants available both within and between species. 
It is a multiply fingered protein, providing the opportu- 
nity to examine interactions between repeat fingers in 
the same protein. In addition, there are many thousands 
of binding sites in genomes, providing a considerable 
source of experimental material. To our knowledge, the 
data we now report provide the first detailed definition 
of the DNA binding specificities of a ZF protein with a 
long (> 10) array of contiguous fingers. What has 
emerged is a picture of considerable complexity, one 
that raises at least as many questions as it answers.

Binding-site positions 

It was somewhat surprising to find that PRDM9 binding 
sites are not uniformly positioned in relation to hotspot 
centers. One possible explanation for this asymmetry 
could be the presence of crossover refractory zones cre- 
ated by the nature of adjoining DNA sequences that cre- 
ate directionality in the spreading of the Holliday 
junctions away from initiation sites [30]. However, our 
genetic data point to a more likely alternative. The 
sequence of the Pbx1 hotspot is identical in B6xCAST
and B6xB6.CAST-1T crosses, but the crossovers in the
former cross are resolved on both sides of the binding 
site, whereas the eight-fold larger number of crossovers 
in the latter cross are almost exclusively on one side of 
the binding site. In the absence of possible cis effects, 
this suggests a role for additional trans-acting factors in 
affecting the directional processing of recombination 
intermediates, with attendant consequences for overall 
crossover rates at this hotspot. 

Extent of ZF usage 

A notable feature of PRDM9 is the required use of all, 
or nearly all, of its contiguous fingers for binding DNA, 
as evidenced by the requirement for 35 to 37 bp for 
PRDM9 Dom2 , which has 12 fingers, and 30 to 33 bp+ for 
any binding activity by PRDM9 Cst , which has 11 fingers 
(the plus indicating that binding is further enhanced at 
the Hlx1 hotspot by an additional 3 bp, one of which
shows considerable nucleotide specificity). This long 
binding array presents a topological issue, as it seems 
that at least five to six fingers make base pair-specific 
contacts with DNA. Moreover, detailed analysis of the 
Hlx1 binding site revealed that multiple nucleotide positions
between positions 5 and 21 show appreciable
nucleotide specificity, without any gaps longer than 2 
bp. If, as in the case of other ZF proteins, these specific 
contacts form in the major groove of DNA, it suggests 
that the ZF domain of PRDM9 can bind continuously 
along the major groove for more than one turn of the 
DNA double helix. In the context of intact cellular 
DNA, this would require a snake-like winding of the 
DNA around its target, which is in marked contrast to 
the accepted rule that ZF proteins do not use more than 
three contiguous fingers to bind DNA [22], and thus 
requires further explication. Given the further observa- 
tion that binding is inhibited by magnesium, which 
complexes with polyphosphates, the requirement for 
additional fingers with low sequence specificity may 
indicate the importance of non-specific charge interac- 
tions between negatively charged phosphate groups and 
the positively charged Zn²⁺ and histidine residues.

Binding-site relatedness 

The three DNA binding sites identified for the 
PRDM9 Cst variant bear little ostensible resemblance to 
each other, with little identity between the nucleotide 
positions along the entire length of these sequences. 
However, a more subtle relationship between the three 
sequences can be detected when they are compared 
with the mutation-analysis data for Hlx1. This suggests
that both the strongest binding within an Hlx1 site and
most of the sequence matches between the three bind- 
ing sites occur over the nucleotides binding to the first 
six fingers of the PRDM9 Cst ZF array. The binding rules 
are obviously complex, and this complexity was emphasized
when we examined the Hlx1 hotspot in detail; the
mutation analysis showed variable stringency along its 
length for base-pair identities. At every position, two, 
three, and often all four bases could serve, albeit not 
always equally well. 

Context dependence of ZF specificity 

In addition to the weak similarity between binding 
sequences, there was also a strong context dependence 
of DNA binding, with the binding properties of a finger 
being dependent on its location in the array. Although 
the second, fifth, seventh, and ninth ZFs of PRDM9 Cst 
have identical amino acid sequences, they did not bind 
related trinucleotide sequences either within or between 
hotspots (see Additional file 5, Figure S5). 

Evolutionary context 

Previous three-dimensional studies of ZF DNA interac- 
tions suggest a model in which four amino acids in each 
finger (positions -1, +2, +3, and +6) bind four consecutive 
base pairs in DNA. The last of the four bases shares its 
binding with the next finger, giving a repeating pattern of 
3 bp per finger (plus 1 bp for the total possible length of 
the binding site) [25]. Computer analyses of multiple 3D 
structures indicate that all four amino acids are equally 
important [22]; however, the evolutionary data suggest 
otherwise for PRDM9. The amino acid composition of its 
ZFs undergoes rapid evolutionary selection to generate 
new hotspots as existing hotspots undergo continual loss 
by mutation [31]. Notably, position +2 is essentially 
invariant in the face of extraordinarily high replacement 
rates at positions -1, +3, and +6 [8,10,32,33], suggesting 
that, at least for PRDM9, this amino acid plays a lesser 
role in determining DNA binding specificity. 
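
As a rough arithmetic check on the canonical model described above (3 bp per finger, plus 1 bp for the total length), the expected binding-site length can be computed for the 11- and 12-finger arrays discussed in this paper. This is only an illustrative sketch; the function name is ours and does not come from the original analysis.

```python
def canonical_site_length(n_fingers: int) -> int:
    """Binding-site length in bp under the canonical zinc-finger model:
    3 bp per finger, plus 1 bp contributed by the fourth base of the
    last finger (hypothetical helper, for illustration only)."""
    if n_fingers < 1:
        raise ValueError("n_fingers must be at least 1")
    return 3 * n_fingers + 1


# Arrays discussed in the text: PRDM9 Cst (11 fingers), PRDM9 Dom2 (12 fingers).
for name, fingers in (("PRDM9 Cst", 11), ("PRDM9 Dom2", 12)):
    print(f"{name}: {fingers} fingers -> {canonical_site_length(fingers)} bp")
```

Under this model a 12-finger array would span 37 bp and an 11-finger array 34 bp, which is close to the 35 to 37 bp and 30 to 33 bp (plus additional flanking bases) reported above for the two variants.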

Conclusions 

Our findings introduce considerable complexity into 
efforts to understand DNA binding by long-array ZF pro- 
teins such as PRDM9. The binding sites for the 
PRDM9 Cst variant are only very subtly related; identical 
ZFs show little constancy in the trinucleotides they bind, 
and a position weight matrix approach [27] failed in a 
search for a DNA binding motif common to genetically 
identified hotspots on mouse chromosomes 1 and 11 
[5,28]. Perhaps most surprising is the finding that binding 
requires DNA sequences involving more than one helical 
turn, even though the terminal nucleotide positions show 
little sequence specificity. To our knowledge, PRDM9 is 
the first long-array ZF protein to have its DNA binding 
specificity at separate binding sites compared in detail. At 
issue now are the biological question of how to solve the 
binding rules for PRDM9 and the chemical question of 
whether the binding complexities of PRDM9 are shared
by the hundreds of other long-array ZF proteins that
carry out a great diversity of biological functions. 

Materials and methods 

Cloning and expression of mouse Prdm9 

Full-length Prdm9 was amplified from cDNA prepared
from the testes of CAST/EiJ (CAST) and C57BL/6J
(B6) mouse strains using primers Prdm9F-HisB and
Prdm9R-HisB (for sequences of primers used for cloning
and sequencing, see Additional file 11) and cloned
into pBAD/HisB (Invitrogen Corp., Carlsbad, CA,
USA) using XhoI and HindIII restriction sites added
to the 5'-ends of the primers. The identity of the
cloned products was confirmed by sequencing (see
Additional file 12). The plasmids were transformed
into TOP10 E. coli cells, allowing arabinose induction
of protein synthesis. For production of PRDM9 protein,
the cells were grown overnight in Luria broth
and then incubated for an additional 4 hours with the
addition of 0.02% arabinose. Total cell lysate was prepared
in native binding buffer (50 mmol/l sodium
phosphate pH 8 and 500 mmol/l NaCl) in accordance
with the manufacturer's specifications, and stored at
-80°C until use, when it was heated at 50°C for 15
minutes to inactivate bacterial DNases. The production
of full-length protein was confirmed by western
blotting using an antibody to the N-terminal His tag.
Crude bacterial lysates used in the in vitro binding
assays contained 15 to 30 µg/ml expressed PRDM9 as
estimated by ELISA.

Histone methylation assay 

Crude bacterial extracts containing induced or non-induced
PRDM9 variants or empty pBAD/HisB vector were
heated for 15 minutes at 50°C, then used for a histone
methyltransferase assay. The reaction mixture contained
20 µl 2× HMT buffer (50 mmol/l Tris-HCl pH 8 and
20% glycerol), 2 µl S-adenosylmethionine (100 µg/µl;
Sigma-Aldrich, St Louis, MO, USA), 10 µl bacterial
extract, 0.2 µl histone substrate (0.5 µg/µl recombinant
H3K4-me2; Active Motif, Carlsbad, CA, USA), and 7.8
µl water. The reaction was kept at 34°C for 30 minutes.

Several commercially available antibodies were used to 
detect specific H3K4me3 signals. All tested antibodies 
showed cross-reactivity with H3K4me2. Therefore, we 
estimated the level of H3K4 trimethylation by induced 
PRDM9 relative to controls with induced empty vector. 
The results presented here were obtained using the anti- 
body that showed the most consistent discrimination 
between H3K4me2 and H3K4me3 signals (recombinant 
H3K4-me2; Active Motif). 

Histone methyltransferase activity was measured by 
dot blotting. The reaction mixtures were blotted onto 
nitrocellulose membrane and the trimethylated H3K4 
was detected with anti-H3K4me3 antibodies (Active
Motif). The chemiluminescent signal was visualized
(Lumiglo Kit; Cell Signaling Technology, Danvers, MA,
USA) and scanned with an image scanner (ImageQuant
LAS-4000; GE Healthcare, Princeton, NJ, USA). The
fraction of the specific H3K4me3 signal was calculated
as

$$F_{\mathrm{H3K4me3}} = \frac{S_{\mathrm{ind}} - S_{\mathrm{un}}}{S_{\mathrm{H3K4me3}} - S_{\mathrm{H3K4me2}}}$$

where $F_{\mathrm{H3K4me3}}$ is the fraction of the specific signal,
$S_{\mathrm{ind}}$ and $S_{\mathrm{un}}$ are the signals of induced and uninduced
extracts, and $S_{\mathrm{H3K4me3}}$ and $S_{\mathrm{H3K4me2}}$ are the signals of
commercial H3K4me3 and H3K4me2, respectively.
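
The normalization just described (induced minus uninduced signal, divided by the difference between the commercial H3K4me3 and H3K4me2 signals) can be sketched as a small helper. This is a minimal illustration, not the authors' analysis script; the function and argument names are hypothetical, and the numbers in the usage line are made up.

```python
def h3k4me3_specific_fraction(s_ind, s_un, s_me3_std, s_me2_std):
    """Fraction of specific H3K4me3 signal.

    s_ind, s_un          -- dot-blot signals of induced and uninduced extracts
    s_me3_std, s_me2_std -- signals of the commercial H3K4me3 and H3K4me2
    All argument names are illustrative assumptions.
    """
    dynamic_range = s_me3_std - s_me2_std
    if dynamic_range == 0:
        raise ValueError("H3K4me3 and H3K4me2 standards give identical signals")
    return (s_ind - s_un) / dynamic_range


# Made-up intensities, purely to show the call.
print(h3k4me3_specific_fraction(150.0, 50.0, 300.0, 100.0))
```

Dividing by the difference between the two standards is what compensates for the antibody cross-reactivity with H3K4me2 noted above.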

Electrophoretic mobility shift assay 

Using an electrophoretic mobility shift assay (EMSA)
(LightShift Chemiluminescent EMSA Kit; Pierce Biotechnology
ThermoScientific, Rockford, IL, USA), a biotin-labeled
test oligo and PRDM9 were allowed to bind
in a mixture containing 10 mmol/l Tris (pH 7.5), 50
mmol/l KCl, 1 mmol/l dithiothreitol, 50 ng/µl poly(dI/dC),
0.05% NP-40, 0.5 pmol/l of labeled double-stranded
oligo, and 3 µl of bacterial extract in a total volume of
20 µl. When necessary, 10 pmol/l of cold competitor or
non-competitor was added. The reaction mixture was
incubated for 40 minutes and separated on a 5% PAA gel
in 0.5× Tris-borate-EDTA (TBE) buffer, which was pre-electrophoresed
for 30 minutes. After electrophoresis,
the separated products were transferred onto nylon
membrane by wet transfer in 0.5× TBE at 380 mA for 1
hour. The membrane was cross-linked by UV light at
120 mJ/cm² (UV Stratalinker 2400; Agilent Technologies,
Santa Clara, CA, USA). Biotin-labeled products
were detected by the chemiluminescence signal produced
by the binding to streptavidin-peroxidase conjugate
in accordance with the manufacturer's specifications,
and the chemiluminescence signal was recorded (G:BOX
system; Syngene, Frederick, MD, USA).

Single-stranded oligos were biotin-labeled at the 3'-end
(Biotin 3' End DNA Labeling Kit; Pierce Biotechnology
ThermoScientific) in accordance with the manufacturer's
specifications. Complementary oligos were then
mixed, denatured at 95°C for 2 minutes, and annealed 
by cooling down to room temperature in a water bath. 
When a PCR product was labeled, the two strands were 
separated by heating at 95°C for 5 minutes. The tube 
was immediately transferred to ice, and then the pro- 
duct was labeled and re-annealed as above. 

The bands were quantified using Gene Tools software
(Syngene, Frederick, MD, USA), using the 'Manual
bands' and 'Absorption' settings. Equal-sized rectangles
were drawn around each shifted and unshifted band
present, and the absorption density was scored
automatically by the software. The proportion of shifted
band was calculated as S/(S+U), where S is the density
of the shifted band and U is the density of the unshifted
band.
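
The S/(S+U) calculation can be written as a short function. This is a sketch for illustration only; the function name and the example densities are assumptions and not output from the quantification software.

```python
def shifted_fraction(shifted_density, unshifted_density):
    """Proportion of shifted band, S / (S + U), from two band densities."""
    total = shifted_density + unshifted_density
    if total <= 0:
        raise ValueError("total band density must be positive")
    return shifted_density / total


# Made-up densities for one lane.
print(shifted_fraction(30.0, 70.0))
```

Because the value is a ratio within a single lane, it is insensitive to lane-to-lane differences in the total amount of labeled oligo loaded.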

The DNA binding sites of hotspots were localized by 
testing whether the electrophoretic mobility of biotin 
end-labeled DNA sequences was altered after binding to 
the PRDM9 Cst and PRDM9 Dom2 proteins expressed in E.
coli. All binding experiments were performed using
crude E. coli extracts, because every attempt to purify 
PRDM9 resulted in insolubility and loss of activity. The 
specificity of binding was confirmed by appropriate con- 
trols, including induced cells that had been transformed 
with empty vector, and the host cells alone. Addition- 
ally, the variant-specific binding of PRDM9 to hotspot 
sequences was confirmed by isolation of protein-biotiny- 
lated DNA complexes onto streptavidin beads and 
detection of PRDM9 by western blotting with antibodies 
against its C-terminal end (see Additional file 13, Figure 
S8). 

To localize binding sites, a tiling approach was used, 
in which PCR-amplified fragments of 200 to 400 bp, 
overlapping by 50 bp, were tested for their ability to 
bind PRDM9 variants (Figures 1A, 2A, 3A, D; also see 
Additional file 2, Figure S2B; Additional file 2, Figure 
S3A; Additional file 2, Figure S4A). Positive fragments
were then reduced to less than 100 bp using the same 
strategy. Because biotin labeling close to the end of a 
binding-site sequence influences binding, the limiting 
sequence for each binding site was then determined by 
comparing the binding ability of progressively shorter, 
unlabeled oligos against the binding ability of longer, 
labeled oligos. 
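
The first step of the tiling approach, covering a region with overlapping fragments, can be sketched as follows. This is an illustrative assumption about how such coordinates might be generated, not the primer-design procedure actually used; the fixed 300-bp fragment length (within the 200 to 400 bp range given above) and the function name are ours.

```python
def tile_region(start, end, fragment_len=300, overlap=50):
    """Return (start, end) coordinates of fragments covering [start, end).

    Consecutive fragments share `overlap` bp. The last fragment is
    truncated at `end` and may therefore be shorter than `fragment_len`.
    """
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be non-negative and below fragment_len")
    step = fragment_len - overlap
    tiles = []
    pos = start
    while pos < end:
        tile_end = min(pos + fragment_len, end)
        tiles.append((pos, tile_end))
        if tile_end == end:
            break
        pos += step
    return tiles


# A hypothetical 1-kb region tiled with 300-bp fragments overlapping by 50 bp.
for tile in tile_region(0, 1000):
    print(tile)
```

The overlap ensures that a binding site lying at the boundary of one fragment is fully contained in its neighbor, so a site is not missed simply because it was split between two amplicons.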

Mutational analysis 

For the competition assays testing how single-nucleotide 
substitutions affect DNA-PRDM9 binding, bacterial 
extract was incubated with 10 pmol/l of unlabeled dou-
ble-stranded oligo for 1 hour (all other components
except labeled oligo were as given above), then 0.5
pmol/l labeled oligo was added, and the extract was
incubated for an additional 4 hours (see Additional file 
8, Figure S6A). Under these conditions, appearance of a 
strong, shifted, labeled DNA band indicates that the 
mutated sequence shows weak or no binding, whereas a 
weakly shifted band indicates strong binding by the 
mutated oligo (see Additional file 5, Figure S5B). The 
separation and detection conditions were the same as 
described above. The strength of binding of each 
mutated oligo to PRDM9 Cst was then determined by 
comparing its ability to replace previously bound, 
labeled, Hlx1 oligo with the abilities of the unlabeled
unmutated oligo and an unlabeled oligo with a randomly scrambled
Hlx1 sequence. Competition was calculated as:

$$B_{\mathrm{mut}} = \frac{S_{\mathrm{un}} - S_{\mathrm{mut}}}{S_{\mathrm{scr}} - S_{\mathrm{un}}}$$

where $B_{\mathrm{mut}}$ is the relative change in binding strength
of the mutated oligo, and $S_{\mathrm{un}}$, $S_{\mathrm{mut}}$, and $S_{\mathrm{scr}}$ are the proportions
of shifted bands of the unmutated, mutated,
and scrambled oligos, respectively.
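
A minimal sketch of the competition calculation follows, assuming the formula has the form B_mut = (S_un - S_mut) / (S_scr - S_un), where each S is the proportion of shifted labeled band seen with the corresponding unlabeled competitor. The function name and the example values are hypothetical.

```python
def relative_binding_change(s_un, s_mut, s_scr):
    """Relative change in binding strength of a mutated competitor oligo.

    s_un  -- shifted proportion with the unmutated competitor
    s_mut -- shifted proportion with the mutated competitor
    s_scr -- shifted proportion with the scrambled competitor
    Assumes B_mut = (s_un - s_mut) / (s_scr - s_un).
    """
    window = s_scr - s_un
    if window == 0:
        raise ValueError("unmutated and scrambled competitors are indistinguishable")
    return (s_un - s_mut) / window


# Made-up proportions: the unmutated competitor leaves 20% of the label
# shifted, and the scrambled competitor leaves 80% shifted.
for s_mut in (0.2, 0.5, 0.8):
    print(s_mut, relative_binding_change(0.2, s_mut, 0.8))
```

Under this reading, a mutated oligo that competes as well as the unmutated sequence scores 0, and one that competes no better than the scrambled sequence scores -1.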

Streptavidin pull-down and western blotting 

Protein-biotinylated DNA complexes were isolated on 
streptavidin beads. PRDM9 variants were expressed in 
E. coli and bound to biotinylated oligos representing bind- 
ing sites, as described above for the EMSA experiments. 
The complexes were then isolated on T1 streptavidin-
coated beads (Dynabeads; Invitrogen Dynal, Oslo, Norway)
and washed three times with EMSA binding buffer, then 
the bound proteins were released from the beads by add- 
ing 0.1% SDS. The proteins were separated on 4 to 20% 
SDS gels and transferred onto polyvinylidene difluoride 
(PVDF) membrane, and the specificity of PRDM9 binding 
was detected on western blots using mouse antibodies 
against the C-terminal part of PRDM9. 

Chromatin immunoprecipitation for H3K4-me3 

Crude isolation of spermatocytes from B6 and B6xCAST 
F1 juvenile mice was performed as reported previously
[34] with minor modifications. 

Spermatocytes were isolated from testes of 12-dpp 
juvenile mice. Cross-linking of spermatocytes was per- 
formed by addition of formaldehyde to a final concen- 
tration of 1% and incubation at room temperature for 
10 minutes with constant rotation. Cross-linking was 
stopped by dropwise addition of glycine to a final concentration
of 125 mmol/l, followed by incubation with
rotation for 5 minutes at room temperature. Cells were
washed twice, separated by centrifugation at 2000 g for
5 minutes at 4°C, and resuspended in 1 ml PBS. After
the final wash, hypotonic lysis buffer (10 mmol/l Tris-HCl
pH 8.0, 1 mmol/l KCl, 1.5 mmol/l MgCl₂) was
added at a concentration of 5 × 10⁶ cells/ml, supplemented
with 1 mmol/l phenylmethanesulfonylfluoride
(PMSF) and 1× protease inhibitor cocktail (PIC; Sigma-Aldrich),
and incubated for 30 minutes at 4°C with rotation
to shear the cellular membrane. Nuclei were pelleted
by centrifugation at 10,000 g for 10 minutes and
resuspended in MNase buffer (50 mmol/l Tris, 1 mmol/l
CaCl₂, 4 mmol/l MgCl₂, 4% NP-40) at 26 µl/10⁶ cells,
supplemented with 1 mmol/l PMSF and 1× PIC. Chromatin
was fragmented and solubilized by addition of
MNase (USB, Cleveland, OH, USA) at 15 U per 5 × 10⁶
cells, followed by incubation for 2 minutes at 37°C.
Nuclease activity was stopped by addition of EDTA to a
final concentration of 10 mmol/l and incubation at 4°C
for 5 minutes. Soluble chromatin was clarified by
centrifugation at 4°C for 10 minutes at top speed, then
the supernatant was transferred to a clean tube and the 
centrifugation repeated. H3K4me3 antibody (Millipore 
Corp., Billerica, MA, USA) was prebound to 20 µl protein-G
beads (Dynabeads; Invitrogen Corp.) following
the manufacturer's protocol. Antibody-bound beads
were washed twice with 100 µl of immunoprecipitation
(IP) buffer (RIPA: 50 mmol/l Tris pH 8.0, 150 mmol/l
NaCl, 1.0% NP-40, 0.5% Na deoxycholate, 0.1% SDS,
supplemented with 50 mg/ml BSA and 0.5 mg/ml salmon-sperm
DNA) and resuspended in a final volume of
75 µl IP buffer with PMSF and 1× PIC. Undiluted chromatin
(25 µl; around 10⁶ cell equivalents) was added to
the beads, and incubated with rotation at 4°C for
2 hours. The chromatin-bound beads were washed three
times with 100 µl IP buffer and then twice with 100 µl
TE buffer pH 8.0 before elution with 125 µl elution buffer
(1% SDS, 20 mmol/l Tris-HCl pH 8.0, 200 mmol/l
NaCl, 5 mmol/l EDTA) supplemented with 50 µg/ml
Proteinase K (Sigma-Aldrich). Elution was carried out
by incubation at 68°C for 2 hours with vigorous shaking
at top speed in a thermal mixer (Thermomixer; Eppendorf,
Westbury, NY, USA) to reverse cross-links and
digest all proteins. An equal aliquot (25 µl) of 'input'
chromatin was diluted to a final volume of 125 µl with
elution buffer and handled in parallel. DNA was recovered
from the beads using magnetic separation, placed
into a clean tube, and then purified using commercially
available methods following the manufacturer's protocol
for PCR clean up (Qiagen Inc., Valencia, CA, USA).
Purified DNA was eluted in a total volume of 200 µl
10 mmol/l Tris pH 8.0.

Primers were designed using OligoPerfect™ primer
design software (Life Technologies Corp., Carlsbad, CA,
USA) using the following parameters: 40 to 60% GC, 57
to 63°C Tm, and a product size of 8 to 120 bp. Each
qPCR assay was performed as triplicate 20-µl reactions
using a commercial kit (Quantifast SYBR Green
PCR Kit; Qiagen Inc.) following the manufacturer's
protocol, then run on a real-time PCR system (Mastercycler®
ep realplex; Eppendorf) for 40 cycles, followed
by a melting-curve analysis. Ct values were calculated 
using an automated threshold and averaged for triplicate 
experiments. If a PCR reaction failed to amplify in the
IgG control, the Ct value for that reaction was arbitrarily
set to 35 cycles, typically around 12 cycles later than
the input sample (that is, less than 0.024% of input). Reactions for
both the input and ChIP sample were seeded with 2 µl
of DNA. The percentage of chromatin bound (fraction
bound) was calculated by:

$$2^{(Ct_{\mathrm{input}} - Ct_{\mathrm{ChIP}})} \times 100.$$
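
The fraction-bound calculation, 2 raised to the power (Ct of input minus Ct of ChIP) and multiplied by 100, can be sketched as below. The function name, the averaging of triplicates inside the function, and the example Ct values are illustrative assumptions.

```python
from statistics import mean


def percent_bound(ct_input_replicates, ct_chip_replicates):
    """Percentage of chromatin bound: 2 ** (Ct_input - Ct_ChIP) * 100.

    Each argument is a list of replicate Ct values, which are averaged
    before the difference is taken.
    """
    ct_input = mean(ct_input_replicates)
    ct_chip = mean(ct_chip_replicates)
    return 2.0 ** (ct_input - ct_chip) * 100.0


# Hypothetical triplicates: input at Ct 23, ChIP sample capped at Ct 35.
print(percent_bound([23.0, 23.0, 23.0], [35.0, 35.0, 35.0]))
```

A ChIP sample that crosses threshold 12 cycles after the input corresponds to 2^-12 of the input signal, or about 0.024%, which matches the floor quoted above for reactions that failed to amplify.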

Bioinformatic search for a DNA binding motif

A position weight matrix (PWM) was created (Table S2)
based on the mutational analysis of the Hlx1 binding
site, and a matrix matching algorithm from the software
package Motif Occurrence Detection Suite [27] was
used. The weights obtained for the matrix were optimized
by maximizing the score for the detection of the
Hlx1 binding site within a scan against the entire Hlx1
hotspot region. Subsequently, the PWM was used in a
scan against all hotspots less than 10 kb in size located
on chromosome 1, along with control regions adjacently
positioned before and after each hotspot; each control
region was sized equivalently to its corresponding hotspot.
To minimize the effects of multiple testing, only
matches with P-values below 10⁻⁵ were considered.
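
A matrix-matching scan of the general kind described above can be sketched in a few lines. This is a generic log-odds PWM scan written from scratch, not the Motif Occurrence Detection Suite API. The toy count matrix, pseudocount, uniform background, and function names are all illustrative assumptions, and the weight optimization and P-value thresholding used in the paper are not reproduced.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (offset, score) pairs."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other symbols
        hits.append((i, sum(pwm[j][base] for j, base in enumerate(window))))
    return hits


def best_hit(sequence, pwm):
    """Highest-scoring window, as an (offset, score) pair."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


# Toy 3-bp matrix favoring "ACG"; real matrices would span the full site.
toy_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]
print(best_hit("TTACGTT", log_odds_pwm(toy_counts)))
```

Scanning equally sized flanking control regions with the same matrix, as described above, gives an empirical baseline for how often high-scoring windows occur by chance.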



Additional material

Additional file 1: Figure S1. Escherichia coli-expressed PRDM9
protein variants retain their H3K4 trimethylation activity. The
Additional material contains maps of all hotspots studied in this paper,
their sequences, additional figures and tables highlighting specific points
in the paper, and the sequences of the oligos used for mapping.

Additional file 2: Figure S2. The PRDM9 Dom2 binding site of Pbx1.

Additional file 3: Figure S3. The PRDM9 Cst binding site of Hlx1.

Additional file 11: Additional experimental procedures: oligos used
for analysis of binding.

Additional file 12: Sequences of Prdm9 Dom2 and Prdm9 Cst cDNA
cloned in pBAD/HisB.

Additional file 13: Figure S8. PRDM9 binds to its targets in an
allele-specific manner.